Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real Time Strategy
(RTS) games. In recent years, SC is also widely accepted as a challenging
testbed for AI research because of its enormous state space, partially observed
information, multi-agent collaboration, and so on. With the help of annual
AIIDE and CIG competitions, a growing number of SC bots are proposed and
continuously improved. However, a large gap remains between the top-level
bots and professional human players. One vital reason is that current SC bots
mainly rely on predefined rules to select macro actions during their games.
These rules are not scalable and efficient enough to cope with the enormous yet
partially observed state space in the game. In this paper, we propose a deep
reinforcement learning (DRL) framework to improve the selection of macro
actions. Our framework is based on the combination of the Ape-X DQN and the
Long-Short-Term-Memory (LSTM). We use this framework to build our bot, named as
LastOrder. Our evaluation, based on training against all bots from the AIIDE
2017 StarCraft AI competition set, shows that LastOrder achieves an 83% winning
rate, outperforming 26 bots in total 28 entrants
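For intuition, the following minimal sketch shows the kind of recurrent Q-network such a framework could use: an LSTM carries a belief state across partially observed game steps, and a linear head emits Q-values over a discrete set of macro actions. All sizes, the observation encoding, and the action set are illustrative assumptions, not the authors' LastOrder implementation; the distributed Ape-X actor/learner loop is omitted.

import torch
import torch.nn as nn

class MacroQNetwork(nn.Module):
    def __init__(self, obs_dim=128, hidden_dim=256, n_macro_actions=20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # The LSTM accumulates a belief state over partially observed steps.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_macro_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) macro-level observation sequence.
        x = self.encoder(obs_seq)
        x, hidden = self.lstm(x, hidden)
        return self.q_head(x), hidden  # Q-values per macro action at each step

# Greedy macro action selection from the most recent observation:
net = MacroQNetwork()
q_values, state = net(torch.randn(1, 1, 128))
action = q_values[:, -1].argmax(dim=-1)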
SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation
Diffusion models based on permutation-equivariant networks can learn
permutation-invariant distributions for graph data. However, in comparison to
their non-invariant counterparts, we have found that these invariant models
encounter greater learning challenges since 1) their effective target
distributions exhibit more modes; 2) their optimal one-step denoising scores
are the score functions of Gaussian mixtures with more components. Motivated by
this analysis, we propose a non-invariant diffusion model, called
SwinGNN, which employs an efficient edge-to-edge 2-WL message
passing network and utilizes shifted window based self-attention inspired by
SwinTransformers. Further, through systematic ablations, we identify several
critical training and sampling techniques that significantly improve the sample
quality of graph generation. Finally, we introduce a simple post-processing
trick, i.e., randomly permuting the generated graphs, which provably
converts any graph generative model to a permutation-invariant one. Extensive
experiments on synthetic and real-world protein and molecule datasets show that
our SwinGNN achieves state-of-the-art performance. Our code is released at
https://github.com/qiyan98/SwinGNN.
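The post-processing trick is simple enough to state in code. Below is a minimal sketch of it, assuming graphs are represented as dense adjacency matrices; the toy symmetric-matrix "generator" is a stand-in assumption, not SwinGNN itself. Because the permutation is drawn uniformly, the induced distribution over labeled graphs is permutation-invariant regardless of the generator.

import numpy as np

def permute_graph(adj, rng):
    # Relabel nodes by a uniformly random permutation.
    perm = rng.permutation(adj.shape[0])
    return adj[np.ix_(perm, perm)]

rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, 2, size=(5, 5)), k=1)
adj = upper + upper.T  # toy symmetric adjacency matrix from a stand-in generator
print(permute_graph(adj, rng))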
Curriculum Temperature for Knowledge Distillation
Most existing distillation methods ignore the flexible role of the
temperature in the loss function and fix it as a hyper-parameter that can be
decided by an inefficient grid search. In general, the temperature controls the
discrepancy between two distributions and can faithfully determine the
difficulty level of the distillation task. Keeping a constant temperature,
i.e., a fixed level of task difficulty, is usually sub-optimal for a growing
student during its progressive learning stages. In this paper, we propose a
simple curriculum-based technique, termed Curriculum Temperature for Knowledge
Distillation (CTKD), which controls the task difficulty level during the
student's learning career through a dynamic and learnable temperature.
Specifically, following an easy-to-hard curriculum, we gradually increase the
distillation loss w.r.t. the temperature, leading to increased distillation
difficulty in an adversarial manner. As an easy-to-use plug-in technique, CTKD
can be seamlessly integrated into existing knowledge distillation frameworks
and brings general improvements at a negligible additional computation cost.
Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the
effectiveness of our method. Our code is available at
https://github.com/zhengli97/CTKD. Comment: AAAI 2023.
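As a rough illustration, the sketch below implements a temperature-scaled KL distillation loss whose temperature is trained adversarially (to increase the loss) through a gradient-reversal layer, with a curriculum weight lam that is ramped up during training. The schedules, the T > 1 parameterization, and all shapes are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        # Reversed gradient: the temperature ascends the distillation loss.
        return -ctx.lam * grad_out, None

log_temp = torch.nn.Parameter(torch.zeros(()))  # learnable log-temperature

def ctkd_loss(student_logits, teacher_logits, lam):
    temp = GradReverse.apply(log_temp, lam).exp() + 1.0  # keep T > 1
    p_teacher = F.softmax(teacher_logits / temp, dim=-1)
    log_p_student = F.log_softmax(student_logits / temp, dim=-1)
    # Standard temperature-scaled KL distillation term (scaled by T^2).
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temp.detach() ** 2

# Usage: ramp lam from 0 toward 1 over training (easy-to-hard curriculum).
s = torch.randn(8, 100, requires_grad=True)
t = torch.randn(8, 100)
ctkd_loss(s, t, lam=0.5).backward()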